Skip to content

feat(_mapping): timestamp pushdown + column-hints fast path (on top of #6439)#6443

Open
congx4 wants to merge 8 commits into
mainfrom
cong/mapping-fast-path-on-list-fields
Open

feat(_mapping): timestamp pushdown + column-hints fast path (on top of #6439)#6443
congx4 wants to merge 8 commits into
mainfrom
cong/mapping-fast-path-on-list-fields

Conversation

@congx4
Copy link
Copy Markdown
Contributor

@congx4 congx4 commented May 18, 2026

What's in this PR

Today GET /_elastic/{index}/_mapping(s) calls list_fields over every published split. On indexes with hundreds of thousands of dynamic fields this can take several seconds, and over a certain threshold the leaf hits QW_FIELD_LIST_SIZE_LIMIT (100k by default) and the request fails.

  1. Timestamp pushdown — new ?start_timestamp=…&end_timestamp=… URL params on _mapping(s), forwarded into ListFieldsRequest verbatim. The metastore prunes the candidate split set by time window before any leaf fan-out. Unit is epoch seconds, half-open interval — matching the existing ListFieldsRequest proto contract.

  2. field_patterns — new ?fields=… URL param (comma-separated names). Push fields patterns to leaf_list_fields to reduce the workload(sort, merge).

Test plan

  • cargo build -p quickwit-serve — clean
  • cargo clippy -p quickwit-serve --tests — clean
  • cargo +nightly fmt --all -- --check — clean
  • cargo nextest run -p quickwit-serve --lib elasticsearch_api:: — 75 / 75 pass

🤖 Generated with Claude Code

@congx4 congx4 force-pushed the cong/mapping-fast-path-on-list-fields branch 2 times, most recently from fef38b8 to 5bab546 Compare May 18, 2026 23:55
@guilload guilload force-pushed the guilload/list-fields branch 2 times, most recently from e8a723e to 0cd3d6c Compare May 19, 2026 19:43
Base automatically changed from guilload/list-fields to main May 19, 2026 19:55
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/model/index_mapping_query_params.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/model/index_mapping_query_params.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/model/mappings.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
congx4 added 2 commits May 20, 2026 09:37
Two small additions to the ES-compat `_mapping(s)` endpoint that
together let downstream callers (e.g. Trino's ES connector) skip the
expensive `list_fields` scan in the common case.

Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every
published split. On indexes with hundreds of thousands of dynamic
fields this can take several seconds and runs into
`QW_FIELD_LIST_SIZE_LIMIT` (100k by default). This PR addresses both
pieces with no proto change:

1. Timestamp pushdown
   New `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`,
   forwarded into `ListFieldsRequest` verbatim. The metastore prunes
   the candidate split set by time window before any leaf fan-out.
   Unit is epoch seconds, half-open interval — matching the existing
   `ListFieldsRequest` proto contract.

2. Column-hints fast path
   New `?fields=…` URL param (comma-separated names). When every
   requested name is a flat literal (no `*`, no `?`, no `.`) declared
   in the union of the indexes' `doc_mapping`, the handler builds the
   response straight from the declared mapping, filtered to those
   names. No `list_fields` call, no split I/O.

   Anything else (wildcards, dotted paths, names not in `doc_mapping`)
   falls through to the full-mapping path: `list_fields` over the
   splits in the time range, full unfiltered mapping returned — same
   shape as today, just with the timestamp-pushdown optimization
   applied.

Notes:
- Unknown query params are silently ignored (no `deny_unknown_fields`)
  to stay compatible with standard ES clients that pass `pretty`,
  `ignore_unavailable`, `allow_no_indices`, etc.
- No proto change. Stays on existing `ListFieldsRequest`.
- `IndexMappingQueryParams` parser and the new
  `ElasticsearchMappingsResponse::from_doc_mapping_filtered` are
  unit-tested in their respective modules.
- rename `fields` query param to `field_patterns` to mirror
  `ListFieldsRequest.field_patterns`
- switch tests to `serde_qs` (already a workspace dep, matches the
  bulk-query-params test style)
- move the declared-field filter out of `mappings.rs` and into the
  rest_handler fast path: trim `field_mappings` in place before calling
  `from_doc_mapping`, dropping `from_doc_mapping_filtered` entirely
  (dynamic fields were already filtered at the leaves via
  `ListFieldsRequest.field_patterns`)
- nits in `parse_field_patterns`: trim/filter before allocating the
  owned String per token
- nits in `collect_declared_top_level_names`: functional flat_map style
@congx4 congx4 force-pushed the cong/mapping-fast-path-on-list-fields branch from 5bab546 to 2f719f6 Compare May 20, 2026 14:08
congx4 added 3 commits May 20, 2026 10:39
- drop the fast-path declared-field `retain(...)` in rest_handler.
  `field_patterns` is now hint-only: it triggers the fast path (skip
  `list_fields`) when every pattern matches a flat declared field, and
  is pushed down to the leaves for dynamic-field filtering. Both fast
  and slow paths now return the full declared schema, matching slow-
  path semantics that existed before.
- remove unused `serde_urlencoded` dev-dep from `quickwit-serve` and
  the workspace `Cargo.toml` (was already unused after switching tests
  to `serde_qs`).
- `collect_declared_top_level_names`: switch back to a procedural form
  preallocated with `HashSet::with_capacity(sum)` to signal the upper
  bound — no filtering, no explosion.
The fast path (skip list_fields when all field_patterns are declared flat
fields) will be implemented in pomsky. This commit reduces the handler to
always call root_list_fields, forwarding field_patterns, start_timestamp,
and end_timestamp as pushdown hints.
@congx4 congx4 requested a review from guilload May 21, 2026 14:44
@congx4 congx4 marked this pull request as ready for review May 21, 2026 14:45
@congx4 congx4 requested a review from a team as a code owner May 21, 2026 14:45
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

https://github.com/quickwit-oss/quickwit/blob/b208d212bd3d9863f6713812a0e8c8f1199d4276/quickwit-serve/src/elasticsearch_api/model/index_mapping_query_params.rs#L31-L32
P2 Badge Honor the advertised fields parameter

The new mapping fast path is described as being enabled by ?fields=..., but this query struct only deserializes field_patterns; because unknown params are explicitly ignored here, requests using GET /_elastic/{index}/_mapping?fields=host will leave field_patterns as None, fan out to every dynamic field, and can still hit the list-fields size/latency problem this change is meant to avoid. Add a serde alias/rename for fields or update the API contract and callers accordingly.


https://github.com/quickwit-oss/quickwit/blob/b208d212bd3d9863f6713812a0e8c8f1199d4276/quickwit-serve/src/elasticsearch_api/rest_handler.rs#L241-L244
P2 Badge Surface invalid field patterns instead of dropping dynamics

Passing user-supplied patterns into ListFieldsRequest without validating them means inputs such as field_patterns=a** hit FieldPatterns::from_strs in the leaf and return an InvalidArgument, but this handler later converts any root_list_fields error to None via .ok(). In that scenario the endpoint returns 200 with only declared mappings and silently omits all dynamic fields, so validate the patterns before dispatching or propagate this specific error as a bad request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Member

@guilload guilload left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a bunch of nits.

Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs Outdated
Comment thread quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs
- move parse_field_patterns into IndexMappingQueryParams::field_patterns()
- return Vec<String> directly instead of Option<Vec<String>>; caller no
  longer needs unwrap_or_default
- use filter_map to trim and drop empties in one pass
- split final Ok(...) into let response + Ok(response)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 010d7f26f6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread quickwit/quickwit-serve/src/elasticsearch_api/model/index_mapping_query_params.rs Outdated
Callers using the documented `?fields=` parameter (e.g. the Trino ES
connector) would get an empty ListFieldsRequest.field_patterns, causing
leaves to enumerate all dynamic fields. Adding the serde alias makes
both `?fields=` and `?field_patterns=` map to the same field.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d9a120b4d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

field_patterns: Vec::new(),
start_timestamp: None,
end_timestamp: None,
field_patterns: params.field_patterns(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate invalid field pattern errors

When fields/field_patterns contains a pattern that the list-fields layer rejects (for example an ES-style pattern with more than one *, which FieldPattern::from_str returns as InvalidArgument), this forwards the user input into root_list_fields but the handler still converts any error to None with .ok() below. In that scenario the request succeeds with only declared mappings and silently drops all matching dynamic fields, making the new filtered mapping endpoint return incomplete results instead of reporting the bad filter.

Useful? React with 👍 / 👎.

InvalidArgument from root_list_fields (e.g. a pattern with more than
one wildcard) now propagates as an ElasticsearchError instead of being
swallowed by .ok(). Other failures (infrastructure, timeout) still
degrade gracefully to a declared-only mapping response.
@congx4 congx4 force-pushed the cong/mapping-fast-path-on-list-fields branch from 28cf128 to 46e8ce9 Compare May 21, 2026 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants